92 research outputs found

    Analyzing Hyperonyms of Stack Overflow Posts

    Communication among people is often challenging because they interpret the terms they use differently. How people interpret a term depends heavily on the semantic context in which the notion was acquired, and different contexts give the terms somewhat distinct meanings. In software development and integration, requirements engineering and customer support are primarily affected by the difficulties stemming from these communication obstacles. The necessary information is often inadequately forwarded to developers, resulting in poorly specified software requirements or misinterpreted user feedback. These communication difficulties can be resolved by clarifying the meanings of the concepts used, and semantic networks built on different contexts are suitable tools for this purpose. This paper presents a formal description of the semantic network and the semantic space needed for the algorithmic treatment of the concepts. It provides a model for extracting hyperonymy and hyponymy relations from text corpora created in specific semantic domains. The model was applied to a corpus acquired from Stack Overflow containing conversations among software developers solving programming issues.

    Mining Hypernyms Semantic Relations from Stack Overflow

    Communication between a software development team and business partners is often challenging because the terms used in the information exchange come from different contexts. The various contexts in which the concepts are defined or used create slightly different semantic fields that can evolve into information and communication silos. Due to this silo effect, the necessary information is often inadequately forwarded to developers, resulting in poorly specified software requirements or misinterpreted user feedback. Communication difficulties can be reduced by introducing a mapping, based on the commonly used terminologies, between the semantic fields of the parties involved in the communication. Our research aims to obtain a suitable semantic database in the form of a semantic network built from the Stack Overflow corpus, which can be considered to encompass the common tacit knowledge of the software development community. Terminologies used in the business world can be mapped onto our semantic network, so that software developers do not miss features that are not specific to their world but are relevant to their clients. We present an initial experiment in mining a semantic network from Stack Overflow and provide insights into the newly captured relations compared to WordNet.

    Towards JavaScript program repair with Generative Pre-trained Transformer (GPT-2)

    The goal of Automated Program Repair (APR) is to find fixes for software bugs without human intervention. The so-called Generate and Validate (G&V) approach has been deemed the most popular method in the last few years: the APR tool creates a patch, which is then validated against an oracle. Natural Language Processing (NLP) has attracted great interest in recent years, with new pre-trained models shattering records on tasks ranging from sentiment analysis to question answering. These deep learning models usually inspire the APR community as well. Such approaches usually require a large dataset on which the model can be trained (or fine-tuned) and evaluated. The criterion for accepting a patch depends on the underlying dataset, but usually the generated patch should be exactly the same as the one created by a human developer. As NLP models become more and more capable of forming sentences that combine into coherent paragraphs, APR tools are also becoming better at generating syntactically and semantically correct source code. As the Generative Pre-trained Transformer (GPT) model is now available to everyone thanks to the NLP and AI research community, it can be fine-tuned to specific tasks (not necessarily on natural language). In this work we use the GPT-2 model to generate source code; to the best of our knowledge, the GPT-2 model has not been used for Automated Program Repair so far. The model is fine-tuned for a specific task: it has been taught to fix JavaScript bugs automatically. To do so, we trained the model on 16863 JS code snippets, from which it could learn the nature of the observed programming language. In our experiments we observed that the GPT-2 model was able to learn how to write syntactically correct source code on almost every attempt, although it failed to learn good bug-fixes in some cases. Nonetheless, it was able to generate the correct fixes in most of the cases, resulting in an overall accuracy of up to 17.25%.

    From C++ Refactorings to Graph Transformations

    In this paper, we study a metamodel for the C++ programming language. We work out refactorings on the C++ metamodel and present their essentials as graph transformations. The refactorings are demonstrated in terms of both the C++ source code and the C++ target code. Graph transformations allow refactoring details to be captured at a level that is conceptual and easy to understand, yet also very precise. Using this approach we managed to formalize two major aspects of refactorings: the structural changes and the preconditions.

    FixJS: A Dataset of Bug-fixing JavaScript Commits

    The field of Automated Program Repair (APR) has received increasing attention in recent years, both from the academic world and from leading IT companies. Its main goal is to repair software bugs automatically, thus significantly reducing the cost of development and maintenance. Recent works use state-of-the-art deep learning models to predict correct patches, and for these models, training on a large amount of data is unavoidable in almost every scenario. Despite this, readily accessible data in the field is very scarce. To contribute to related research, we present FixJS, a dataset containing bug-fixing information from ~2 million commits. The commits were gathered from GitHub and processed locally to obtain both the buggy (before the bug-fixing commit) and fixed (after the fix) versions of the same program. We focused on JavaScript functions, as JavaScript is one of the most popular programming languages globally and functions are first-class objects there. The data includes more than 300,000 samples of such functions, including commit information, before/after states and three source code representations.